Human-Computer Interaction
☆ A Sociotechnical Framework For Addressing Stigma and Designing Personalized Digital Health Products
Stigma, a recognized global barrier to effective disease management, impacts
social interactions, resource access, and psychological well-being. This study
introduces a patient-centered framework, grounded in sociotechnical systems
(STS) theory, for deriving design requirements and tailored interventions for
health conditions subject to social stigma. We tested this framework through a
mixed-methods study with chronic pelvic pain (CPP) patients. Our approach led
to the identification of ten design requirements encompassing behavioral and
psychological support as well as strategies for day-to-day living. The findings
reveal a preference among CPP patients for priming and social support
interventions.
This study underscores the value of a systems-based perspective in healthcare,
advocating for a nuanced, patient-centered approach that addresses the complex
nature of health conditions affected by social stigma. It contributes to the
ongoing discourse on integrating STS theory into healthcare frameworks,
highlighting the need for targeted strategies to combat the complexities of
stigma in patient care.
comment: 19 pages, 5 tables, 1 figure
☆ Towards Inclusive Video Commenting: Introducing Signmaku for the Deaf and Hard-of-Hearing CHI 2024
Si Chen, Haocong Cheng, Jason Situ, Desirée Kirst, Suzy Su, Saumya Malhotra, Lawrence Angrave, Qi Wang, Yun Huang
Previous research underscored the potential of danmaku--a text-based
commenting feature on videos--in engaging hearing audiences. Yet, for many Deaf
and hard-of-hearing (DHH) individuals, American Sign Language (ASL) takes
precedence over English. To improve inclusivity, we introduce "Signmaku," a new
commenting mechanism that uses ASL, serving as a sign language counterpart to
danmaku. Through a need-finding study (N=12) and a within-subject experiment
(N=20), we evaluated three design styles: real human faces, cartoon-like
figures, and robotic representations. The results showed that cartoon-like
signmaku not only entertained but also encouraged participants to create and
share ASL comments, with fewer privacy concerns compared to the other designs.
Conversely, the robotic representations faced challenges in accurately
depicting hand movements and facial expressions, resulting in higher cognitive
demands on users. Signmaku featuring real human faces elicited the lowest
cognitive load and was the most comprehensible among all three types. Our
findings offered novel design implications for leveraging generative AI to
create signmaku comments, enriching co-learning experiences for DHH
individuals.
comment: 14 pages, CHI 2024
☆ SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings CHI
Ting-Yao Hsu, Chieh-Yang Huang, Shih-Hong Huang, Ryan Rossi, Sungchul Kim, Tong Yu, C. Lee Giles, Ting-Hao K. Huang
Crafting effective captions for figures is important, as readers depend heavily
on these captions to grasp a figure's message. However, despite a
well-developed set of AI technologies for figures and captions, these
technologies have rarely been tested for their usefulness in aiding caption
writing. This paper introduces SciCapenter, an interactive system that brings
together cutting-edge
AI technologies for scientific figure captions to aid caption composition.
SciCapenter generates a variety of captions for each figure in a scholarly
article, providing scores and a comprehensive checklist to assess caption
quality across multiple critical aspects, such as helpfulness, OCR mention, key
takeaways, and visual properties reference. Users can directly edit captions in
SciCapenter, resubmit for revised evaluations, and iteratively refine them. A
user study with Ph.D. students indicates that SciCapenter significantly lowers
the cognitive load of caption writing. Participants' feedback further offers
valuable design insights for future systems aiming to enhance caption writing.
comment: CHI EA '24: Extended Abstracts of the 2024 CHI Conference on Human
Factors in Computing Systems
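For illustration only, here is a minimal heuristic sketch of the kind of
caption-quality checklist the abstract above describes. SciCapenter's actual
checks are AI-based; the cue lists, length threshold, and all names below are
invented assumptions, not the system's implementation.

import re

# Hypothetical cue lists; SciCapenter's real aspects are model-scored.
TAKEAWAY_CUES = {"shows", "demonstrates", "indicates", "suggests", "reveals"}
VISUAL_CUES = {"color", "axis", "line", "bar", "curve", "dashed", "shaded"}

def caption_checklist(caption, ocr_tokens):
    """Run simple boolean checks mirroring aspects such as OCR mention,
    key takeaway, and visual-properties reference."""
    words = set(re.findall(r"[a-z0-9]+", caption.lower()))
    return {
        "mentions_ocr_text": bool(words & {t.lower() for t in ocr_tokens}),
        "states_takeaway": bool(words & TAKEAWAY_CUES),
        "references_visuals": bool(words & VISUAL_CUES),
        "adequate_length": len(caption.split()) >= 15,
    }

checks = caption_checklist(
    "Figure 2 shows that accuracy (blue line) rises as training data grows.",
    ocr_tokens={"accuracy", "epochs"},
)
print(checks, "score:", sum(checks.values()) / len(checks))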
☆ Exploring the Boundaries of Ambient Awareness in Twitter
Ambient awareness refers to the ability of social media users to obtain
knowledge about who knows what (i.e., users' expertise) in their network, by
simply being exposed to other users' content (e.g., tweets on Twitter). Previous
work, based on user surveys, reveals that individuals self-report ambient
awareness only for parts of their networks. However, it is unclear whether it
is their limited cognitive capacity or the limited exposure to diagnostic
tweets (i.e., online content) that prevents people from developing ambient
awareness for their complete network. In this work, we focus on in-wall ambient
awareness (IWAA) on Twitter and conduct a two-step data-driven analysis that
allows us to explore to what extent IWAA is likely, or even possible. First,
we rely on reactions (e.g., likes) as strong evidence of users being aware of
experts on Twitter. Unfortunately, such strong evidence can only be measured
for active users, who represent a minority in the network. Thus, to study the
boundaries of IWAA to a larger extent, in the second part of our analysis we
instead focus on passive exposure to content generated by other users, which
we refer to as in-wall visibility. This analysis shows that, in line with
Levordashka and Utz (2016), IWAA is plausible only for a subset of users,
while the majority is unlikely to develop it, if doing so is possible at all. We
hope that our methodology paves the way for the emergence of data-driven
approaches for the study of ambient awareness.
☆ FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts
Quickly understanding lengthy lecture videos is essential for learners with
limited time and interest in various topics to improve their learning
efficiency. To this end, video summarization has been actively researched to
enable users to view only important scenes from a video. However, these studies
focus on either the visual or the audio information of a video when extracting
important segments. They therefore risk missing important information when both
the teacher's speech and the visual information on the blackboard or slides
matter, as in a lecture video. To tackle
this issue, we propose FastPerson, a video summarization approach that
considers both the visual and auditory information in lecture videos.
FastPerson creates summary videos by utilizing audio transcriptions along with
on-screen images and text, minimizing the risk of overlooking crucial
information for learners. Further, it provides a feature that allows learners
to switch between the summary and original videos for each chapter of the
video, enabling them to adjust the pace of learning based on their interests
and level of understanding. We conducted an evaluation with 40 participants to
assess the effectiveness of our method and confirmed that it reduced viewing
time by 53% while maintaining the same level of comprehension as traditional
video playback.
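The abstract above does not specify FastPerson's segment-scoring algorithm, so
the following is only a minimal sketch of the general idea it describes:
ranking lecture chapters using both the speech transcript and on-screen text,
so neither modality is overlooked. The Chapter structure, the TF-IDF scoring,
and the keep ratio are all illustrative assumptions, not the authors' method.

import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Chapter:
    start: float        # chapter start time in seconds
    end: float          # chapter end time in seconds
    transcript: str     # ASR output for the chapter's speech
    screen_text: str    # OCR output from slide/blackboard frames

def tfidf_scores(chapters):
    """Score chapters by mean TF-IDF weight over transcript + screen text."""
    docs = [(c.transcript + " " + c.screen_text).lower().split()
            for c in chapters]
    df = Counter(w for doc in docs for w in set(doc))
    n = len(docs)
    return [sum(tf * math.log(n / df[w]) for w, tf in Counter(doc).items())
            / max(len(doc), 1)
            for doc in docs]

def summarize(chapters, keep_ratio=0.5):
    """Keep the top-scoring chapters, preserving chronological order."""
    scores = tfidf_scores(chapters)
    k = max(1, round(len(chapters) * keep_ratio))
    top = sorted(range(len(chapters)), key=lambda i: -scores[i])[:k]
    return [chapters[i] for i in sorted(top)]

chapters = [
    Chapter(0, 60, "today we derive the gradient descent update rule",
            "w_new = w - lr * gradient"),
    Chapter(60, 120, "any questions so far okay let us move on", ""),
]
print([(c.start, c.end) for c in summarize(chapters)])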
☆ Evaluating Authoring Tools with the Explorable Authoring Requirements SC
Explorables with interactive, multimodal content, openly available on the
web, are a promising medium for education. Yet authoring such explorables
requires web development expertise, excluding most educators and students from
the authoring and remixing process. Some tools are available to reduce this
barrier to entry, and others are in development, making a method to evaluate
these new tools necessary. On the basis of the software quality model ISO
25010, empirical results, and domain modeling, we derive the Explorable
Authoring Requirements (EAR) as a requirements catalogue that explorable authoring
tools should implement. We then outline a future research design to
operationalize EAR.
comment: 12 pages plus references, preprint of paper for NaKoDi2022 (adjunct
to WiPSCE 2022 conference)
☆ Panonut360: A Head and Eye Tracking Dataset for Panoramic Video ACM MM
With the rapid development and widespread application of VR/AR technology,
maximizing the quality of immersive panoramic video services that match users'
personal preferences and habits has become a long-standing challenge.
Understanding the saliency region where users focus, based on data collected
with HMDs, can improve multimedia encoding, transmission, and quality
assessment. At the same time, large-scale datasets are essential for
researchers and developers to explore short/long-term user behavior patterns
and train AI models related to panoramic videos. However, existing panoramic
video datasets often include only low-frequency head or eye movement data
collected over short videos, lacking sufficient data for analyzing users'
Field of View (FoV) and generating video saliency regions.
Driven by these practical factors, in this paper, we present a head and eye
tracking dataset involving 50 users (25 males and 25 females) watching 15
panoramic videos. The dataset provides details on the viewport and gaze
attention locations of users. In addition, we present sample statistics
extracted from the dataset. For example, the deviation between head and eye
movements challenges the widely held assumption that gaze attention decreases
from the center of the FoV following a Gaussian distribution. Our analysis
reveals a consistent downward offset in gaze fixations relative to the FoV in
experimental settings involving multiple users and videos. This motivates the
dataset's name, Panonut: the resulting saliency weighting is shaped like a
donut. Finally, we also provide a script that generates saliency distributions
from given head or eye coordinates, together with pre-generated saliency
distribution map sets for each video derived from the collected eye tracking
data.
The dataset is available at https://dianvrlab.github.io/Panonut360/.
comment: 7 pages, accepted to ACM MMSys'24
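As a rough illustration of the Panonut360 entry's final point, here is a
minimal sketch, not the authors' released script, of generating a saliency
distribution from gaze coordinates: a 2D Gaussian is accumulated at each
fixation and shifted downward to mimic the reported offset of gaze relative to
the FoV center. Resolution, sigma, and offset values are illustrative.

import numpy as np

def saliency_map(fixations, width=1920, height=960,
                 sigma=80.0, down_offset=40.0):
    """Accumulate a 2D Gaussian at each (x, y) pixel fixation, shifted
    downward by down_offset to reflect the reported gaze offset."""
    ys, xs = np.mgrid[0:height, 0:width]
    sal = np.zeros((height, width))
    for x, y in fixations:
        cy = y + down_offset  # shift the kernel below the FoV center
        sal += np.exp(-((xs - x) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return sal / sal.max()    # normalize to [0, 1]

# Example: two fixations near the equator of an equirectangular frame.
m = saliency_map([(960, 480), (1200, 500)])
print(m.shape, float(m.max()))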
☆ ExpressEdit: Video Editing with Natural Language and Sketching
Informational videos serve as a crucial source for explaining conceptual and
procedural knowledge to novices and experts alike. When producing informational
videos, editors edit videos by overlaying text/images or trimming footage to
enhance the video quality and make it more engaging. However, video editing can
be difficult and time-consuming, especially for novice video editors who often
struggle with expressing and implementing their editing ideas. To address this
challenge, we first explored how multimodality, namely natural language (NL)
and sketching, two natural modalities humans use for expression, can be
utilized to support video editors in expressing video editing ideas. We
gathered 176 multimodal expressions of editing commands from 10 video editors,
which revealed the patterns of use of NL and sketching in describing edit
intents. Based on the findings, we present ExpressEdit, a system that enables
editing videos via NL text and sketching on the video frame. Powered by an LLM and
vision models, the system interprets (1) temporal, (2) spatial, and (3)
operational references in an NL command and spatial references from sketching.
The system implements the interpreted edits, which the user can then iterate
on. An observational study (N=10) showed that ExpressEdit enhanced the ability
of novice video editors to express and implement their edit ideas. The system
allowed participants to perform edits more efficiently and generate more ideas
by producing edits from users' multimodal commands and supporting iteration on
those commands. This work offers insights into the design
of future multimodal interfaces and AI-based pipelines for video editing.
comment: 22 pages, 5 figures, to be published in ACM IUI 2024
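As a rough sketch of the NL-command interpretation the ExpressEdit entry
describes, the code below prompts a language model to extract temporal,
spatial, and operational references as JSON. The prompt, the output schema,
and the call_llm stub are hypothetical illustrations; the actual pipeline also
interprets sketch input with vision models.

import json

PROMPT = """Extract edit references from the command below as JSON with keys
"temporal" (timestamps or scene descriptions), "spatial" (regions or objects
in the frame), and "operational" (the edit operation and its parameters).
Command: {command}
JSON:"""

def call_llm(prompt):
    # Hypothetical stub: replace with a real chat-completion API call.
    return json.dumps({
        "temporal": ["0:05-0:12"],
        "spatial": ["lower third of the frame"],
        "operational": ["overlay text 'Step 1'"],
    })

def interpret_command(command):
    """Ask the LLM for structured references, then parse the JSON
    (in practice, validate it against a schema before applying edits)."""
    return json.loads(call_llm(PROMPT.format(command=command)))

print(interpret_command(
    "Between 0:05 and 0:12, add the text 'Step 1' at the bottom of the frame"))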
☆ Coimagining the Future of Voice Assistants with Cultural Sensitivity
Voice assistants (VAs) are becoming a feature of our everyday life. Yet, the
user experience (UX) is often limited, leading to underuse, disengagement, and
abandonment. Co-designing interactions for VAs with potential end-users can be
useful. Crowdsourcing this process online and anonymously may add value.
However, most work has been done in the English-speaking West on dialogue data
sets. We must be sensitive to cultural differences in language, social
interactions, and attitudes towards technology. Our aims were to explore the
value of co-designing VAs in the non-Western context of Japan and demonstrate
the necessity of cultural sensitivity. We conducted an online elicitation study
(N = 135) where Americans (n = 64) and Japanese people (n = 71) imagined
dialogues (N = 282) and activities (N = 73) with future VAs. We discuss the
implications for coimagining interactions with future VAs, offer design
guidelines for the Japanese and English-speaking US contexts, and suggest
opportunities for cultural plurality in VA design and scholarship.
comment: 21 pages
☆ DiffGaze: A Diffusion Model for Continuous Gaze Sequence Generation on 360° Images
We present DiffGaze, a novel method for generating realistic and diverse
continuous human gaze sequences on 360° images based on a conditional
score-based denoising diffusion model. Generating human gaze on 360°
images is important for various human-computer interaction and computer
graphics applications, e.g. for creating large-scale eye tracking datasets or
for realistic animation of virtual humans. However, existing methods are
limited to predicting discrete fixation sequences or aggregated saliency maps,
thereby neglecting crucial parts of natural gaze behaviour. Our method uses
features extracted from 360° images as the condition and two transformers
to model the temporal and spatial dependencies of continuous human gaze. We
evaluate DiffGaze on two 360° image benchmarks for gaze sequence
generation as well as scanpath prediction and saliency prediction. Our
evaluations show that DiffGaze outperforms state-of-the-art methods on all
tasks on both benchmarks. We also report a 21-participant user study showing
that our method generates gaze sequences that are indistinguishable from real
human sequences.
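As a toy illustration of this class of model, the PyTorch sketch below runs
one DDPM-style training step: noise a continuous gaze sequence, then train a
transformer to predict that noise, conditioned on image features. It
simplifies DiffGaze considerably; a single temporal transformer stands in for
the paper's two transformers, a noise-prediction objective stands in for the
score-based formulation, and all shapes and hyperparameters are illustrative.

import torch
import torch.nn as nn

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

class GazeDenoiser(nn.Module):
    def __init__(self, d=128, cond_dim=512):
        super().__init__()
        self.inp = nn.Linear(2, d)          # (x, y) gaze coordinates
        self.cond = nn.Linear(cond_dim, d)  # image-feature conditioning
        self.t_emb = nn.Embedding(T, d)     # timestep embedding
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d, 2)

    def forward(self, noisy_gaze, t, img_feats):
        h = self.inp(noisy_gaze)                  # (batch, seq, d)
        h = h + self.cond(img_feats)[:, None, :]  # broadcast condition
        h = h + self.t_emb(t)[:, None, :]
        return self.out(self.temporal(h))         # predicted noise

model = GazeDenoiser()
gaze = torch.rand(8, 100, 2)       # 8 gaze sequences of 100 samples
img_feats = torch.randn(8, 512)    # e.g. CNN features of the 360° image
t = torch.randint(0, T, (8,))
noise = torch.randn_like(gaze)
ab = alpha_bar[t][:, None, None]
noisy = ab.sqrt() * gaze + (1 - ab).sqrt() * noise  # forward process
loss = ((model(noisy, t, img_feats) - noise) ** 2).mean()
loss.backward()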
☆ Cognitively Biased Users Interacting with Algorithmically Biased Results in Whole-Session Search on Controversial Topics
When interacting with information retrieval (IR) systems, users, affected by
confirmation biases, tend to select search results that confirm their existing
beliefs on socially significant contentious issues. To understand the judgments
and attitude changes of users searching online, our study examined how
cognitively biased users interact with algorithmically biased search engine
result pages (SERPs). We designed three-query search sessions on debated topics
under various bias conditions. We recruited 1,321 crowdsourced participants
and explored their attitude changes, search interactions, and the effects of
confirmation bias. Three key findings emerged: 1) most attitude changes occur
in the initial query of a search session; 2) confirmation bias and result
presentation on SERPs affect search behaviors in the current query and
perceived familiarity with clicked results in subsequent queries. The bias
position also affects attitude changes of users with lower perceived openness
to conflicting opinions; 3) interactions in the first query and dwell time
throughout the session are associated with users' attitude changes in different
forms. Our study goes beyond traditional simulation-based evaluation settings
and simulated rational users, sheds light on the mixed effects of human biases
and algorithmic biases in controversial information retrieval tasks, and can
inform the design of bias-aware user models, human-centered bias mitigation
techniques, and socially responsible intelligent IR systems.
♻ ☆ As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli
As synthetic media becomes progressively more realistic and barriers to using
it continue to fall, the technology has been increasingly utilized for
malicious purposes, from financial fraud to nonconsensual pornography. Today,
the principal defense against being misled by synthetic media relies on the
ability of the human observer to visually and auditorily discern between real
and fake. However, it remains unclear just how vulnerable people actually are
to deceptive synthetic media in the course of their day-to-day lives. We
conducted a perceptual study with 1276 participants to assess how accurate
people were at distinguishing synthetic images, audio-only, video-only, and
audiovisual stimuli from authentic ones. To reflect the circumstances under which
people would likely encounter synthetic media in the wild, testing conditions
and stimuli emulated a typical online platform, while all synthetic media used
in the survey was sourced from publicly accessible generative AI technology.
We find that overall, participants struggled to meaningfully discern between
synthetic and authentic content. We also find that detection performance
worsens when stimuli contain synthetic rather than authentic content, feature
human faces rather than non-face objects, involve a single modality rather
than multimodal presentation, mix authentic and synthetic elements rather than
being fully synthetic (for audiovisual stimuli), and feature foreign languages
rather than languages the observer is fluent in. Finally, we also find that
prior knowledge of synthetic media does not meaningfully impact participants'
detection
performance. Collectively, these results indicate that people are highly
susceptible to being tricked by synthetic media in their daily lives and that
human perceptual detection capabilities can no longer be relied upon as an
effective defense.
comment: For study pre-registration, see https://osf.io/fnhr3
♻ ☆ App Planner: Utilizing Generative AI in K-12 Mobile App Development Education
App Planner is an interactive support tool for K-12 students, designed to
assist in creating mobile applications. By utilizing generative AI, App Planner
helps students articulate the problem and solution through guided conversations
via a chat-based interface. It assists them in brainstorming and formulating
new ideas for applications, provides feedback on those ideas, and stimulates
creative thinking. Here we report usability tests from our preliminary study
with high-school students who appreciated App Planner for aiding the app design
process and providing new viewpoints on human aspects, especially the
potential negative impacts of their creations.
comment: 7 pages, 1 figure, 1 table
♻ ☆ A Design Space for Intelligent and Interactive Writing Assistants CHI 2024
Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L. C. Guo, Md Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Agnia Sergeyuk, Antonette Shibani, Disha Shrivastava, Lila Shroff, Jessi Stark, Sarah Sterman, Sitong Wang, Antoine Bosselut, Daniel Buschek, Joseph Chee Chang, Sherol Chen, Max Kreminski, Joonsuk Park, Roy Pea, Eugenia H. Rho, Shannon Zejiang Shen, Pao Siangliulue
In our era of rapid technological advancement, the research landscape for
writing assistants has become increasingly fragmented across various research
communities. We seek to address this challenge by proposing a design space as a
structured way to examine and explore the multidimensional space of intelligent
and interactive writing assistants. Through a large community collaboration, we
explore five aspects of writing assistants: task, user, technology,
interaction, and ecosystem. Within each aspect, we define dimensions (i.e.,
fundamental components of an aspect) and codes (i.e., potential options for
each dimension) by systematically reviewing 115 papers. Our design space aims
to offer researchers and designers a practical tool to navigate, comprehend,
and compare the various possibilities of writing assistants, and aid in the
envisioning and design of new writing assistants.
comment: Published as a conference paper at CHI 2024
♻ ☆ A Critical Reflection on the Use of Toxicity Detection Algorithms in Proactive Content Moderation Systems
Toxicity detection algorithms, originally designed with reactive content
moderation in mind, are increasingly being deployed into proactive end-user
interventions to moderate content. Through a socio-technical lens and focusing
on contexts in which they are applied, we explore the use of these algorithms
in proactive moderation systems. Placing a toxicity detection algorithm in an
imagined virtual mobile keyboard, we critically explore how such algorithms
could be used to proactively reduce the sending of toxic content. We present
findings from design workshops conducted with four distinct stakeholder groups
and find concerns around how contextual complexities may exacerbate
inequalities around content moderation processes. Whilst only specific user
groups are likely to directly benefit from these interventions, we highlight
the potential for other groups to misuse them to circumvent detection, validate
and gamify hate, and manipulate algorithmic models to exacerbate harm.
♻ ☆ Learning User Embeddings from Human Gaze for Personalised Saliency Prediction
Reusable embeddings of user behaviour have shown significant performance
improvements for the personalised saliency prediction task. However, prior
works require explicit user characteristics and preferences as input, which are
often difficult to obtain. We present a novel method to extract user embeddings
from pairs of natural images and corresponding saliency maps generated from a
small amount of user-specific eye tracking data. At the core of our method is a
Siamese convolutional neural encoder that learns the user embeddings by
contrasting the image and personal saliency map pairs of different users.
Evaluations on two public saliency datasets show that the generated embeddings
have high discriminative power, are effective at refining universal saliency
maps for individual users, and generalise well across users and images.
Finally, based on our model's ability to encode individual user
characteristics, our work points towards other applications that can benefit
from reusable embeddings of gaze behaviour.
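The core idea of the entry above lends itself to a compact sketch: a Siamese
convolutional encoder maps image + personal saliency map pairs to user
embeddings, trained with a contrastive loss so same-user pairs are close and
different-user pairs far apart. Architecture sizes, the loss margin, and all
names below are illustrative assumptions, not the authors' model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class UserEncoder(nn.Module):
    def __init__(self, emb_dim=64):
        super().__init__()
        # 4 input channels: RGB image + 1-channel personal saliency map.
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, emb_dim)

    def forward(self, image, saliency):
        x = torch.cat([image, saliency], dim=1)
        return self.fc(self.conv(x).flatten(1))

def contrastive_loss(z1, z2, same_user, margin=1.0):
    """Pull same-user embeddings together, push others beyond a margin."""
    d = F.pairwise_distance(z1, z2)
    return (same_user * d.pow(2) +
            (1 - same_user) * F.relu(margin - d).pow(2)).mean()

# Toy batch: 8 (image, saliency) pairs per branch of the Siamese network.
enc = UserEncoder()
img1, sal1 = torch.rand(8, 3, 64, 64), torch.rand(8, 1, 64, 64)
img2, sal2 = torch.rand(8, 3, 64, 64), torch.rand(8, 1, 64, 64)
same_user = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(enc(img1, sal1), enc(img2, sal2), same_user)
loss.backward()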
♻ ☆ Towards Massive Interaction with Generalist Robotics: A Systematic Review of XR-enabled Remote Human-Robot Interaction Systems
The rising interest in generalist robots seeks to create robots versatile
enough to handle multiple tasks in a variety of environments, with humans
interacting with such robots through immersive interfaces. In the context of
human-robot interaction (HRI), this survey provides an exhaustive review of the
applications of extended reality (XR) technologies in the field of remote HRI.
We developed a systematic search strategy based on the PRISMA methodology.
From an initial pool of 2,561 articles, 100 research papers met our inclusion
criteria and were included. We categorized and summarized the domain in detail,
delving into XR technologies, including augmented reality (AR), virtual reality
(VR), and mixed reality (MR), and their applications in facilitating intuitive
and effective remote control and interaction with robotic systems. The survey
highlights existing articles on the application of XR technologies, user
experience enhancement, and various interaction designs for XR in remote HRI,
providing insights into current trends and future directions. We also
identified potential gaps and opportunities for improving remote HRI systems
through XR technology, guiding and informing future XR and robotics research.
♻ ☆ Data journeys in popular science: Producing climate change and COVID-19 data visualizations at Scientific American
Vast amounts of (open) data are increasingly used to make arguments about
crisis topics such as climate change and global pandemics. Data visualizations
are central to bringing these viewpoints to broader publics. However,
visualizations often conceal the many contexts involved in their production,
ranging from decisions made in research labs about collecting and sharing data
to choices made in editorial rooms about which data stories to tell. In this
paper, we examine how data visualizations about climate change and COVID-19 are
produced in popular science magazines, using Scientific American, an
established English-language popular science magazine, as a case study. To do
this, we apply the analytical concept of data journeys (Leonelli, 2020) in a
mixed methods study that centers on interviews with Scientific American staff
and is supplemented by a visualization analysis of selected charts. In
particular, we discuss the affordances of working with open data, the role of
collaborative data practices, and how the magazine works to counter
misinformation and increase transparency. This work makes an empirical
contribution by providing insight into the data (visualization) practices of
science communicators and demonstrating how the concept of data journeys can be
used as an analytical framework.
comment: 44 pages, 4 figures, 3 boxes
♻ ☆ The opportunities and risks of large language models in mental health
Hannah R. Lawrence, Renee A. Schneider, Susan B. Rubin, Maja J. Mataric, Daniel J. McDuff, Megan Jones Bell
Global rates of mental health concerns are rising, and there is increasing
realization that existing models of mental healthcare will not adequately
expand to meet the demand. With the emergence of large language models (LLMs)
has come great optimism regarding their promise to create novel, large-scale
solutions to support mental health. Despite their nascence, LLMs have already
been applied to mental health-related tasks. In this review, we summarize the
extant literature on efforts to use LLMs to provide mental health education,
assessment, and intervention and highlight key opportunities for positive
impact in each area. We then highlight risks associated with the application
of LLMs to mental health and encourage the adoption of strategies to mitigate
these risks.
The urgent need for mental health support must be balanced with responsible
development, testing, and deployment of mental health LLMs. Especially critical
is ensuring that mental health LLMs are fine-tuned for mental health, enhance
mental health equity, adhere to ethical standards, and that people, including
those with lived experience with mental health concerns, are involved in all
stages from development through deployment. Prioritizing these efforts will
minimize potential harms to mental health and maximize the likelihood that LLMs
will positively impact mental health globally.
comment: 12 pages, 2 tables, 4 figures